Dynamic Patterns of Public Transport Usage

Author

Wan Kee

Published

November 25, 2023

A Geospatial Analysis of Bus Stop Passenger Volume in Urban Environments

1.1 Overview

In the intricate mosaic of urban transportation, bus stops serve as pivotal nodes that capture the pulse of city life through the ebb and flow of passenger trips. The study of passenger trip traffic, particularly during peak hours, becomes essential for enhancing service efficiency, planning urban infrastructure, and improving the overall commuter experience. This geospatial analysis is anchored in the bustling landscape of a metropolitan area, where data on bus stops, resident population distribution, urban development plans (master plan sub-zones), and passenger volume intertwine to paint a comprehensive picture of transit dynamics.

This analysis aims to dissect the rhythms of urban mobility with the following objectives:

  1. Geovisualization and Analysis

    • Compute the passenger trips generated by origin at the hexagon level
    • Visualize the geographical distribution of the passenger trips
  2. Local Indicators of Spatial Association (LISA) Analysis

    • Compute LISA of the passengers trips generate by origin at hexagon level
    • Visualize the LISA maps of the passengers trips generate by origin at hexagon level
  3. Emerging Hot Spot Analysis (EHSA)

    • Perform Mann-Kendall Test by using the spatio-temporal local Gi* values
    • Visualize EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level

1.2 Load packages

sf imports geospatial data and readr imports csv file.

tidyverse performs data import, wrangling and visualization. dplyr perform relational join.

spdep compute spatial weights and calculate spatially lagged variables.

pacman::p_load(sf, readr, dplyr, ggplot2, knitr, mapview, spdep, tmap, tidyverse)

1.3 Import data

We will be using the following geospatial (busstops, mpsz) and aspatial (odbus, pop) datasets:

busstops contains the detailed information for all bus stops currently serviced by buses, including bus stop code, road name, description, location coordinates.

Source: LTA DataMall (Postman URL)

busstops <- st_read(dsn = "data/BusStopLocation_Jul2023", layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `/Users/chockwankee/Documents/chockwk/ISSS624_Geospatial_Analytics/Take_home_Ex/Take_home_Ex01/data/BusStopLocation_Jul2023' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

The output indicates that the geospatial objects are point features. There are 5161 features and 3 fields. It is in SVY21 projected coordinates system with XY dimension.

odbus contains the number of trips by weekdays and weekends from origin to destination bus stops. It reflects the passenger trip traffic for October 2023.

Source: LTA DataMall (Postman URL)

odbus = read_csv("data/origin_destination_bus_202310.csv")

The output does not indicate any geospatial objects. There are 5,694,297 records and 7 fields.

pop contains Singapore residents grouped by planning Area or subzone, age group, sex and floor area of residence. The data period is from June 2011 onwards. From June 2011 to 2019, the planning areas refer to areas demarcated in the Master Plan 2014, and from June 2020 onwards will be Master Plan 2019.

x Description
PA Planning area
SZ Subzone
AG Age group
Sex Sex
FA Residence floor area
Pop Resident count
Time Time/period

Source: Department of Statistics (Link)

pop <- read_csv("data/respopagesexfa2011to2020/respopagesexfa2011to2020.csv")

mpsz is the Master Plan 2019, a forward looking guiding plan for Singapore’s development in the medium term over the next 10 to 15 years published in 2019. Note this mpsz differs from that in previous chapter, Data Wrangling.

Source: URA (Download here)

mpsz = st_read(dsn="data/MPSZ-2019", layer="MPSZ-2019") %>% 
  st_transform(crs=3414)
Reading layer `MPSZ-2019' from data source 
  `/Users/chockwankee/Documents/chockwk/ISSS624_Geospatial_Analytics/Take_home_Ex/Take_home_Ex01/data/MPSZ-2019' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

The output indicates that the geospatial objects are multipolygon features. There are 332 features and 6 fields. It is in WGS84 projected coordinates system with XY dimension.

# Assuming mpsz is your sf object
#mpsz <- st_read(dsn="data/MP19_SUBZONE", layer="MP19_SUBZONE_NO_SEA")

# Ensure that 'geometry' is the active geometry column
#mpsz <- st_set_geometry(mpsz, "geometry")

# Check for invalid geometries and apply st_make_valid if needed
#mpsz$geometry_is_valid <- st_is_valid(mpsz$geometry)
#mpsz <- mpsz %>%
#  mutate(geometry = ifelse(!geometry_is_valid, st_make_valid(geometry), geometry))

# Now, remove the Z dimension explicitly if it exists
#mpsz <- st_zm(mpsz, drop = TRUE, what = "Z")

# View the data structure to confirm changes
#print(mpsz1)
Warning

Warning: GDAL Message 1: Sub-geometry 0 has coordinate dimension 2, but container has 3

The warning message indicate that a discrepancy in the dimensionality of the geometries in mpsz Shapefile. Some of the sub-geometries have 2D coordinates (X and Y), while the overall container expects 3D coordinates (X, Y, and Z). The st_zm(what = "Z") function drops the Z dimension from the geometries, which eliminate the warnings about coordinate dimensions. We are not performing 3D analysis, this approach should work fine for most applications.

1.4 Create Spatial Grids

Spatial grids are commonly used in spatial analysis to divide the study area into equal size, regular polygons that tessellate the area of interest. On the selection of grid, regular geographic units, such as square grid or fishnets, rarely reflect real world situations. Hexagons are compact shape and can overcome oddly-shaped geographical units.

Step 1: Create a hexgonal grid

hexagon is a hexagon layer of 250m where the distance is the perpendicular distance between the centre of the hexagon and its edges. It is a substitute for ‘mpsz’ which is relatively coarse and irregular.

Step 1

# Create hexagonal grid (250m from center to edges) and add grid ID.
area_hexagon_grid = st_make_grid(busstops, cellsize = 500, what = "polygons", square = FALSE)

hexagon_grid_sf = st_sf(area_hexagon_grid) %>% 
  mutate(grid_id = 1:length(lengths(area_hexagon_grid)))
hexagon_grid_sf
Simple feature collection with 5580 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3470.122 ymin: 26193.43 xmax: 48720.12 ymax: 53184.55
Projected CRS: SVY21 / Singapore TM
First 10 features:
                area_hexagon_grid grid_id
1  POLYGON ((3720.122 26626.44...       1
2  POLYGON ((3720.122 27492.46...       2
3  POLYGON ((3720.122 28358.49...       3
4  POLYGON ((3720.122 29224.51...       4
5  POLYGON ((3720.122 30090.54...       5
6  POLYGON ((3720.122 30956.57...       6
7  POLYGON ((3720.122 31822.59...       7
8  POLYGON ((3720.122 32688.62...       8
9  POLYGON ((3720.122 33554.64...       9
10 POLYGON ((3720.122 34420.67...      10

The output indicates that the geospatial objects are polygon features. There are 5580 features and 1 fields. It is in SVY21 projected coordinates system with XY dimension.

Step 2

# Count the number of bus stops in each grid and remove grids without any value.
hexagon_grid_sf$grid_id = lengths(st_intersects(hexagon_grid_sf, busstops))
hexagon_count = filter(hexagon_grid_sf, grid_id > 0)
hexagon_count
Simple feature collection with 1524 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3720.122 ymin: 26193.43 xmax: 48720.12 ymax: 53184.55
Projected CRS: SVY21 / Singapore TM
First 10 features:
                area_hexagon_grid grid_id
1  POLYGON ((3970.122 27925.48...       1
2  POLYGON ((4220.122 28358.49...       1
3  POLYGON ((4470.122 30523.55...       1
4  POLYGON ((4720.122 28358.49...       1
5  POLYGON ((4720.122 30090.54...       2
6  POLYGON ((4720.122 30956.57...       1
7  POLYGON ((4720.122 31822.59...       1
8  POLYGON ((4970.122 28791.5,...       1
9  POLYGON ((4970.122 29657.53...       1
10 POLYGON ((4970.122 30523.55...       2

The output indicates that the geospatial objects retained polygon features and there are more than one polygon feature in a grid_id. The intersection of busstops and hexagon_grid_sf is 1524 features and 1 fields, indicating only 1524 out of 5580 features contains bus stops. It is in SVY21 projected coordinates system with XY dimension.

Step 3

# Set tmap to view mode for interactive plotting.
tmap_mode("view")
hexagon = tm_shape(hexagon_count)+
  tm_fill(
    col = "grid_id",
    palette = "Reds",
    style = "cont",
    title = "Number of Bus Stops in Singapore",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.format = list(
      grid_id = list(format = "f", digits = 0)
    )
  )+
  tm_borders(col = "grey40", lwd = 0.7)
hexagon

The geospatial distribution of bus stops in Singapore, as visualized in the above map, is extensive and dispersed throughout the central region, with a notable concentration of stops with darker shades of red signifying higher concentrations. Outlying regions, along the coastal areas, show fewer bus stops as indicated by the presence of lighter-colored hexagons or even white spaces where no stops are present. This distribution pattern suggests a robust public transportation infrastructure in urban and densely populated areas, tapering off in less populated or industrial regions.

1.5 Explore data

st_geometry() displays basic information of the feature class, such as type of geometry, the geographic extent of the features and the coordinate system of the data. glimpse() transposes the columns in a dataset and makes it possible to see the column name, data type and values in every column in a data frame.

st_geometry(busstops)
Geometry set for 5161 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
glimpse(busstops)
Rows: 5,161
Columns: 4
$ BUS_STOP_N <chr> "22069", "32071", "44331", "96081", "11561", "66191", "2338…
$ BUS_ROOF_N <chr> "B06", "B23", "B01", "B05", "B05", "B03", "B02A", "B02", "B…
$ LOC_DESC   <chr> "OPP CEVA LOGISTICS", "AFT TRACK 13", "BLK 239", "GRACE IND…
$ geometry   <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE)
glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <fct> 04168, 04168, 80119, 80119, 44069, 20281, 20281, 1…
$ DESTINATION_PT_CODE <fct> 10051, 10051, 90079, 90079, 17229, 20141, 20141, 1…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…
glimpse(pop)
Rows: 738,492
Columns: 7
$ PA   <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo K…
$ SZ   <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "Ang Mo Kio T…
$ AG   <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to…
$ Sex  <chr> "Males", "Males", "Males", "Males", "Males", "Males", "Females", …
$ FA   <chr> "<= 60", ">60 to 80", ">80 to 100", ">100 to 120", ">120", "Not A…
$ Pop  <dbl> 0, 10, 30, 80, 20, 0, 0, 10, 40, 90, 10, 0, 0, 10, 30, 110, 30, 0…
$ Time <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…
glimpse(mpsz)
Rows: 332
Columns: 7
$ SUBZONE_N  <chr> "MARINA EAST", "INSTITUTION HILL", "ROBERTSON QUAY", "JURON…
$ SUBZONE_C  <chr> "MESZ01", "RVSZ05", "SRSZ01", "WISZ01", "MUSZ02", "MPSZ05",…
$ PLN_AREA_N <chr> "MARINA EAST", "RIVER VALLEY", "SINGAPORE RIVER", "WESTERN …
$ PLN_AREA_C <chr> "ME", "RV", "SR", "WI", "MU", "MP", "WI", "WI", "SI", "SI",…
$ REGION_N   <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGION", "WEST…
$ REGION_C   <chr> "CR", "CR", "CR", "WR", "CR", "CR", "WR", "WR", "CR", "CR",…
$ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((33222.98 29..., MULTIPOLYGON (…

1.5 Plot data

mapview creates interactive visualisations of spatial data to examine and visually investigate both aspects of spatial data, the geometries and their attributes.

plot() takes parameters for specifying points in the diagram. At its simplest, it can plot two numbers against each other. With datasets, it can generate maps and plot the specified columns/attributes, with default up to nine plots or maximum all plots.

mapview(busstops, cex = 3, alpha = 0.5, popup = NULL)
# Filter hexagons that contain more than 8 bus stops
hexagon_red = filter(hexagon_grid_sf, grid_id>8)

tmap_mode("view")
redhex = tm_shape(hexagon_red)+
  tm_fill(
    col = "grid_id",
    palette = "Reds",
    style = "cont",
    title = "Number of Bus Stops",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.format = list(
      grid_id = list(format = "f", digits = 0)
    )
  )+
  tm_borders(col = "grey40", lwd = 0.7)
redhex

Sembawang MRT (9 bus stops)

Pasir Ris (11 bus stops)

# Summarize the total trips by day type
total_trips <- odbus %>%
  group_by(DAY_TYPE) %>%
  summarise(total = sum(TOTAL_TRIPS))

# Plot using ggplot2
ggplot(total_trips, aes(x = DAY_TYPE, y = total, fill = DAY_TYPE)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Total Trips by Day Type", x = "Day Type", y = "Total Trips")

popdata2020 <- pop %>%
  filter(Time == 2020) %>%
  group_by(PA, SZ, AG) %>%
  summarise(`POP` = sum(`Pop`)) %>%
  ungroup() %>%
  pivot_wider(names_from=AG, values_from=POP) %>%
  mutate(YOUNG = rowSums(.[3:6]) + rowSums(.[12])) %>%
  mutate(`ECONOMY ACTIVE` = rowSums(.[7:11]) + rowSums(.[13:15]))  %>%
  mutate(`AGED`=rowSums(.[16:21])) %>%
  mutate(`TOTAL`=rowSums(.[3:21])) %>%
  mutate(`DEPENDENCY` = (`YOUNG` + `AGED`)/`ECONOMY ACTIVE`) %>%
  select(`PA`, `SZ`, `YOUNG`, `ECONOMY ACTIVE`, `AGED`, `TOTAL`, `DEPENDENCY`)
plot(mpsz["PLN_AREA_N"])

1.6 Geovisualisation and Analysis

1.6.2 Compute the passenger trips generated by origin

Step 1:

Classify by time interval

# Function to assign peak and non-peak times
time_interval <- function(day_type, time_per_hour) {
  if (day_type == "WEEKDAY") {
    if (time_per_hour >= 6 & time_per_hour <= 9) {
      "morning peak"
    } else if (time_per_hour >= 17 & time_per_hour <= 20) {
      "afternoon peak"
    } else {
      "non peak"
    }
  } else if (day_type == "WEEKENDS/HOLIDAY") {
    if (time_per_hour >= 11 & time_per_hour <= 14) {
      "morning peak"
    } else if (time_per_hour >= 16 & time_per_hour <= 19) {
      "evening peak"
    } else {
      "non peak"
    }
  } else {
    "non peak"
  }
}

# Assuming 'odbus' is your data frame
odbus$TIME <- mapply(time_interval, odbus$DAY_TYPE, odbus$TIME_PER_HOUR)

# Checking the first few rows of the data frame to verify the new column
head(odbus)
# A tibble: 6 × 8
  YEAR_MONTH DAY_TYPE         TIME_PER_H…¹ PT_TYPE ORIGI…² DESTI…³ TOTAL…⁴ TIME 
  <chr>      <chr>                   <dbl> <chr>   <fct>   <fct>     <dbl> <chr>
1 2023-10    WEEKENDS/HOLIDAY           16 BUS     04168   10051         3 even…
2 2023-10    WEEKDAY                    16 BUS     04168   10051         5 non …
3 2023-10    WEEKENDS/HOLIDAY           14 BUS     80119   90079         3 morn…
4 2023-10    WEEKDAY                    14 BUS     80119   90079         5 non …
5 2023-10    WEEKDAY                    17 BUS     44069   17229         4 afte…
6 2023-10    WEEKENDS/HOLIDAY           17 BUS     20281   20141         1 even…
# … with abbreviated variable names ¹​TIME_PER_HOUR, ²​ORIGIN_PT_CODE,
#   ³​DESTINATION_PT_CODE, ⁴​TOTAL_TRIPS

Step 2:

passengertrips <- left_join(busstops, odbus, 
                            by = c("BUS_STOP_N" = "ORIGIN_PT_CODE"))

Determine weekday or weekend

passengertrips_day <- passengertrips %>%
  group_by(BUS_STOP_N, BUS_ROOF_N, LOC_DESC, YEAR_MONTH, geometry) %>%
  summarise(
    WEEKDAY_TRIPS = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY"], na.rm = TRUE),
    WEEKENDS_HOLIDAYS_TRIPS = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY"], na.rm = TRUE),
    .groups = "drop"
  )

Timing of Day

Determine peak or non peak

passengertrips_time <- passengertrips %>%
  group_by(BUS_STOP_N, BUS_ROOF_N, LOC_DESC, YEAR_MONTH, geometry) %>%
  summarise(
    WEEKDAY_MORNING_PEAK = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY" & TIME == "morning peak"], na.rm = TRUE),
    WEEKDAY_AFTERNOON_PEAK = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY" & TIME == "afternoon peak"], na.rm = TRUE),
    WEEKDAY_NON_PEAK = sum(TOTAL_TRIPS[DAY_TYPE == "WEEKDAY" & TIME == "non peak"], na.rm = TRUE),
    WEEKENDS_HOLIDAYS_MORNING_PEAK = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY" & TIME == "morning peak"], na.rm = TRUE),
    WEEKENDS_HOLIDAYS_EVENING_PEAK = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY" & TIME == "evening peak"], na.rm = TRUE),
    WEEKENDS_HOLIDAYS_NON_PEAK = sum(TOTAL_TRIPS[DAY_TYPE != "WEEKDAY" & TIME == "non peak"], na.rm = TRUE),
    .groups = "drop"
  )

passengertrips_time <- passengertrips_time %>% 
  filter(!(WEEKDAY_MORNING_PEAK == 0 
           & WEEKDAY_AFTERNOON_PEAK == 0
           & WEEKDAY_NON_PEAK == 0
           & WEEKENDS_HOLIDAYS_MORNING_PEAK == 0
           & WEEKENDS_HOLIDAYS_EVENING_PEAK == 0
           & WEEKENDS_HOLIDAYS_NON_PEAK == 0))

Step 3: Ensure both datasets are in the same coordinate reference system (CRS).

st_crs(passengertrips_day)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(passengertrips_time)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]
st_crs(hexagon_grid_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Step 4: Perform a spatial join to match trips to hexagons

passengergrid_day <- st_join(hexagon_grid_sf, passengertrips_day, join = st_intersects)
# Remove rows with no trips (NA values)
passengergrid_day <- passengergrid_day %>% 
  filter(!is.na(BUS_STOP_N))
glimpse(passengergrid_day)
Rows: 5,161
Columns: 8
$ grid_id                 <int> 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1…
$ BUS_STOP_N              <chr> "25059", "25751", "26379", "25761", "25719", "…
$ BUS_ROOF_N              <chr> "UNK", "B02D", "NIL", "B03", "B01C", "NIL", "N…
$ LOC_DESC                <chr> "AFT TUAS STH BLVD", "BEF TUAS STH AVE 14", "Y…
$ YEAR_MONTH              <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2…
$ WEEKDAY_TRIPS           <dbl> 526, 468, 1055, 3420, 4584, 944, 689, 768, 644…
$ WEEKENDS_HOLIDAYS_TRIPS <dbl> 105, 96, 247, 816, 1688, 379, 248, 147, 108, 3…
$ area_hexagon_grid       <POLYGON [m]> POLYGON ((3970.122 27925.48..., POLYGO…
passengergrid_time <- st_join(hexagon_grid_sf, passengertrips_time, join = st_intersects)
# Remove rows with no trips (NA values)
passengergrid_time <- passengergrid_time %>% 
  filter(!is.na(BUS_STOP_N))
glimpse(passengergrid_time)
Rows: 5,029
Columns: 12
$ grid_id                        <int> 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 2, …
$ BUS_STOP_N                     <chr> "25059", "25751", "26379", "25761", "25…
$ BUS_ROOF_N                     <chr> "UNK", "B02D", "NIL", "B03", "B01C", "N…
$ LOC_DESC                       <chr> "AFT TUAS STH BLVD", "BEF TUAS STH AVE …
$ YEAR_MONTH                     <chr> "2023-10", "2023-10", "2023-10", "2023-…
$ WEEKDAY_MORNING_PEAK           <dbl> 103, 52, 78, 185, 815, 301, 53, 60, 64,…
$ WEEKDAY_AFTERNOON_PEAK         <dbl> 390, 114, 291, 1905, 2600, 299, 241, 36…
$ WEEKDAY_NON_PEAK               <dbl> 33, 302, 686, 1330, 1169, 344, 395, 340…
$ WEEKENDS_HOLIDAYS_MORNING_PEAK <dbl> 0, 26, 52, 187, 367, 88, 76, 45, 21, 39…
$ WEEKENDS_HOLIDAYS_EVENING_PEAK <dbl> 56, 14, 100, 346, 533, 101, 55, 49, 53,…
$ WEEKENDS_HOLIDAYS_NON_PEAK     <dbl> 49, 56, 95, 283, 788, 190, 117, 53, 34,…
$ area_hexagon_grid              <POLYGON [m]> POLYGON ((3970.122 27925.48...,…

Step 5: Split passengers trip into weekday and weekend

::: panel-tabset

Day of Week

# Subset for Weekday
weekday_trips <- passengergrid_day %>%
  group_by(BUS_STOP_N) %>%
  summarise(
    weekday_trips = sum(WEEKDAY_TRIPS, na.rm = TRUE),
  )

# Subset for Weekend
weekend_trips <- passengergrid_day %>%
  group_by(BUS_STOP_N) %>%
  summarise(
    weekend_trips = sum(WEEKENDS_HOLIDAYS_TRIPS, na.rm = TRUE),
  )

Time of Day

# First, ensure all necessary columns are present in the dataframe
passengergrid_clean <- passengergrid_time %>%
  mutate(
    WEEKDAY_MORNING_PEAK = ifelse(is.na(WEEKDAY_MORNING_PEAK), 0, WEEKDAY_MORNING_PEAK),
    WEEKDAY_AFTERNOON_PEAK = ifelse(is.na(WEEKDAY_AFTERNOON_PEAK), 0, WEEKDAY_AFTERNOON_PEAK),
    WEEKENDS_HOLIDAYS_MORNING_PEAK = ifelse(is.na(WEEKENDS_HOLIDAYS_MORNING_PEAK), 0, WEEKENDS_HOLIDAYS_MORNING_PEAK),
    WEEKENDS_HOLIDAYS_EVENING_PEAK = ifelse(is.na(WEEKENDS_HOLIDAYS_EVENING_PEAK), 0, WEEKENDS_HOLIDAYS_EVENING_PEAK)
  )

# Function to summarise and filter bus stops with no trips
summarise_and_filter <- function(data, column) {
  data %>%
    group_by(BUS_STOP_N) %>%
    summarise(Total_Trips = sum({{ column }}, na.rm = TRUE)) %>%
    filter(Total_Trips > 0) %>%
    ungroup()
}

# Create subsets using the function
weekday_morning_peak <- summarise_and_filter(passengergrid_clean, WEEKDAY_MORNING_PEAK)

weekday_afternoon_peak <- summarise_and_filter(passengergrid_clean, WEEKDAY_AFTERNOON_PEAK)

weekend_morning_peak <- summarise_and_filter(passengergrid_clean, WEEKENDS_HOLIDAYS_MORNING_PEAK)

weekend_evening_peak <- summarise_and_filter(passengergrid_clean, WEEKENDS_HOLIDAYS_EVENING_PEAK)

# Check for any BUS_STOP_N that might still have issues
problematic_stops <- passengergrid_clean %>%
  filter(is.na(BUS_STOP_N)) %>%
  pull(BUS_STOP_N) %>%
  unique()

# If problematic_stops has any values, you may need to address these specifically,
# for example by removing them from passengergrid_clean before creating the subsets
if (length(problematic_stops) > 0) {
  passengergrid_clean <- passengergrid_clean %>%
    filter(!(BUS_STOP_N %in% problematic_stops))
}
# Convert your data to an sf object if it's not one already
weekday_sf <- st_as_sf(weekday_trips, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekday_sf$weekday_trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekday_sf) +
  tm_polygons("weekday_trips", 
              palette = "Blues", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekday Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "Weekday Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Convert your data to an sf object if it's not one already
weekday_morning_peak_sf <- st_as_sf(weekday_morning_peak, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekday_morning_peak_sf$Total_Trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekday_morning_peak_sf) +
  tm_polygons("Total_Trips", 
              palette = "Blues", 
              border.col = "grey40",
              breaks = breaks,
              title = "weekday_morning_peak Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "weekday_morning_peak Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Convert your data to an sf object if it's not one already
weekday_afternoon_peak_sf <- st_as_sf(weekday_afternoon_peak, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekday_afternoon_peak_sf$Total_Trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekday_afternoon_peak_sf) +
  tm_polygons("Total_Trips", 
              palette = "Blues", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekday Afternoon Peak Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "Weekday Afternoon Peak Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Convert your data to an sf object if it's not one already
weekend_sf <- st_as_sf(weekend_trips, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekend_sf$weekend_trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekend_sf) +
  tm_polygons("weekend_trips", 
              palette = "Purples", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekend Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "Weekend Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)

Weekend morning peaks

# Convert your data to an sf object if it's not one already
weekend_morning_peak_sf <- st_as_sf(weekend_morning_peak, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekend_morning_peak_sf$Total_Trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekend_morning_peak_sf) +
  tm_polygons("Total_Trips", 
              palette = "Purples", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekend Morning Peak Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "Weekend Morning Peak Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)
# Convert your data to an sf object if it's not one already
weekend_evening_peak_sf <- st_as_sf(weekend_evening_peak, coords = c("longitude", "latitude"), crs = 4326) 

# Calculate breaks at intervals of 25,000
max_trips <- max(weekend_evening_peak_sf$Total_Trips, na.rm = TRUE)
breaks <- c(1, 2500, 5000, 7500, 10000, 25000, 50000, Inf)

# Print the map
tmap_mode("view")
tm_shape(weekend_evening_peak_sf) +
  tm_polygons("Total_Trips", 
              palette = "Purples", 
              border.col = "grey40",
              breaks = breaks,
              title = "Weekend Evening Peak Trips") +
  tm_scale_bar()+
  tm_layout(main.title = "Weekend Evening Peak Trips by Bus Stop", 
            main.title.position = "center", 
            frame = FALSE)

1.6 Local Indicators of Spatial Association (LISA) Analysis

1.6.1 Compute LISA of the passengers trips generate by origin at hexagon level.

# Remove rows with NA in BUS_STOP_N and calculate TOTAL_TRIPS
passengergrid_total <- passengergrid_day %>%
  mutate(TOTAL_TRIPS = WEEKDAY_TRIPS + WEEKENDS_HOLIDAYS_TRIPS)
wm_q <- poly2nb(passengergrid_total, 
                queen=TRUE)
summary(wm_q)
Neighbour list object:
Number of regions: 5161 
Number of nonzero links: 112788 
Percentage nonzero weights: 0.4234432 
Average number of links: 21.8539 
7 regions with no links:
1767 2407 2884 3350 4627 5154 5222
Link number distribution:

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19 
  7  12  17  27  36  89  67 103  85  71 125 135 105 140 150 139 189 191 222 153 
 20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39 
186 161 244 261 220 212 200 205 196 173 117 128 135 117 114  69  62  49  34  50 
 40  41  42  43  44  45  46  49 
 46  52  11  25  13   6   4   8 
12 least connected regions:
34 251 313 1543 3507 3507.1 5129 5159 5278 5278.1 5558 5558.1 with 1 link
8 most connected regions:
3292 3292.1 3292.2 3292.3 3292.4 3292.5 3292.6 3292.7 with 49 links
#rswm_q <- nb2listw(wm_q, 
#                   style="W")
#rswm_q
# Remove NA values from WEEKDAY_TRIPS
passengergrid_total <- passengergrid_total[!is.na(passengergrid_total$WEEKDAY_TRIPS),]

# Compute local Moran's I
#localIMI_weekday <- localmoran(passengergrid_total$WEEKDAY_TRIPS, rswm_q)

# View the results
#head(localIMI_weekday)

1.6.2 Display the LISA maps of the passengers trips generate by origin at hexagon level.

1.7 Emerging Hot Spot Analysis (EHSA)

longitude <- map_dbl(passengertrips_time$geometry, ~st_centroid(.x)[[1]])
latitude <- map_dbl(passengertrips_time$geometry, ~st_centroid(.x)[[2]])
coords <- cbind(longitude, latitude)
#coords <- coordinates(hunan)
k1 <- knn2nb(knearneigh(coords))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
summary(k1dists)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0    3210    4940    6217    8403   19777 
wm_d62 <- dnearneigh(coords, 0, 62, longlat = TRUE)
# wm62_lw <- nb2listw(wm_d62, style = 'W')
# summary(wm62_lw)
knn <- knn2nb(knearneigh(coords, k=8))
knn_lw <- nb2listw(knn, style = 'B')

1.7.1 Perform Mann-Kendall Test by using the spatio-temporal local Gi* values.

1.7.2 Prepared EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level.

weekday evening peak passenger trips by origin; hexagons dont not have shared boundary; isolated with no neighbours but a hotspot.